Extracting and re-using research data from chemistry e-theses: the SPECTRa-T project
Scientific e-theses are data-rich resources, but much of the information they contain is not readily accessible. For chemistry, the SPECTRa-T project has addressed this problem by developing data-mining techniques to extract experimental data, creating RDF (Resource Description Framework) triples for exposure to sophisticated Semantic Web searches.
We used OSCAR3, an Open Source chemistry text-mining tool, to parse and extract data from theses in PDF, and from theses in Office Open XML document format.
Theses in PDF suffered data corruption and a loss of formatting that prevented the identification of chemical objects. Theses in .docx yielded semantically rich SciXML that enabled the additional extraction of associated data. Chemical objects were placed in a data repository, and RDF triples deposited in a triplestore.
Data-mining from chemistry e-theses is both desirable and feasible, but the use of PDF, the de facto format standard for deposit in most repositories, prevents the optimal extraction of data for semantic querying. To facilitate such extraction, we recommend that universities also require deposition of chemistry e-theses in an XML document format. Further work is required to clarify the complex IPR issues and ensure that they do not become an unwarranted barrier to data extraction and re-use.
SPECTRa-T Final Report July 2008
Much of the experimental data generated by postgraduate researchers in chemistry and related departments are conventionally reported in theses. Although such theses might describe up to 50 novel chemical syntheses, with full characterisation of synthesised compounds, much of this is not communicated in peer-reviewed publication to the scientific community in an appropriate form (numbers are reduced to points on diagrams, tables are converted to graphs in pixel form) and a significant proportion of preparative procedures (anecdotally estimated at 80%) are never formally submitted at all. Although the bare outline essentials of the synthesis are published, the detailed experimental recipes (as found in the thesis) are often omitted.
The SPECTRa-T (Submission, Preservation and Exposure of Chemistry Teaching and Research Data from Theses) project was funded as a proof-of-concept approach to develop software to automatically extract chemical terms and objects contained within electronic theses (e-theses). We have shown that it is possible to reliably identify organic chemical terms in both Portable Document Format (PDF) and Office Open XML (DOCX) format theses and to extract and deposit these within a Resource Description Framework (RDF) triplestore. Semantic Web standards for searching data have been developed by W3C, and we have explored the viability of RDF-based semantic querying to enable re-use of the data contained within chemistry e-theses. Although the internal structure of PDF did not permit the identification of chemical objects (e.g. spectral assignments and physical properties), their capture from DOCX format e-theses as Chemical Markup Language (CML) data files was achieved. These files were deposited in APP-enabled data repositories, each being URI-linked to a searchable named chemical entity in the RDF triplestore.
We have demonstrated:
• routine and automatic extraction of Chemical Objects (e.g. molecules, spectra) and named chemical entities in high volumes, transformation into metadata and their capture into data repositories and triplestores.
• exploration of the viability of RDF-based semantic querying.
• review of current document format practice in the deposition of chemistry theses and how this influences ease of data extraction.
This machine-based identification of chemical terms was achieved using modified OSCAR3 processing software which, in part using the ChEBI chemistry ontology, is specific to ‘small molecule’ organic structures typically found in synthetic organic chemistry theses. The need to develop other chemistry-domain ontologies is indicated. SPECTRa-T was funded by JISC's Digital Repositories Programme as a joint project between Cambridge University Library and the chemistry departments of the University of Cambridge and Imperial College London.
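The linkage described in this report, where each deposited CML file is URI-linked to a searchable named chemical entity in a triplestore, can be illustrated with a minimal sketch. This is not the project's software: the `TripleStore` class, the `http://example.org/` URIs and the `mentions`, `hasName` and `hasDataFile` predicates are all hypothetical, invented here only to show how subject-predicate-object triples support a simple pattern query.

```python
from typing import Iterator, List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Tiny in-memory stand-in for an RDF triplestore (illustrative only)."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in sorted(self._triples):
            if (
                (subject is None or t.subject == subject)
                and (predicate is None or t.predicate == predicate)
                and (obj is None or t.obj == obj)
            ):
                yield t


# Hypothetical namespace, entities and predicates, assumed for this sketch.
EX = "http://example.org/"
store = TripleStore()
store.add(EX + "thesis/1", EX + "mentions", EX + "entity/benzaldehyde")
store.add(EX + "entity/benzaldehyde", EX + "hasName", "benzaldehyde")
store.add(EX + "entity/benzaldehyde", EX + "hasDataFile", EX + "repo/cml/0001.cml")


def data_files_for(name: str) -> List[str]:
    """Return URIs of data files linked to entities carrying the given name."""
    files = []
    for named in store.match(predicate=EX + "hasName", obj=name):
        for link in store.match(subject=named.subject, predicate=EX + "hasDataFile"):
            files.append(link.obj)
    return files
```

Calling `data_files_for("benzaldehyde")` walks from the name to the entity and then to its linked data file. A production system would more likely use a real triplestore and a query language such as SPARQL rather than this hand-rolled matcher.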
The SPECTRa Project: A Wider Chemistry View
This presentation was delivered to the "eBank/R4L/SPECTRa joint consultation workshop: Exploring the eCrystals Federation Model". eBank is a research programme which had run two successful projects (funded by JISC) at the time of the workshop. R4L and SPECTRa were in progress when the presentations were delivered. The aim of the project was to help develop the terms of reference for the next stage of the eBank programme: the formation of a federation of digital repositories holding crystallography data. The SPECTRa project is a collaboration between Imperial College London and the University of Cambridge investigating needs, attitudes and solutions to depositing chemistry data in institutional digital repositories. Because of its departmental context and its attention to three different chemistry specialisms, the SPECTRa project covers ground not previously covered (especially by the eBank projects).
Tonge and Downing explain the cultural and technical problems raised, and propose some solutions being developed by the project.
JISC
Imperial College London
University of Cambridge
The impact of Covid-19 on the mental health of professional footballers
The Covid-19 pandemic has had huge ramifications for professional football. This commentary focuses on the impact of the pandemic on the mental health of professional footballers, specifically those within the English Premier League, English Football League, FA Women’s Super League and FA Women’s Championship. It considers a holistic approach to mental health, the environment of professional football, and the impact of career transitions and critical moments on mental health. The intention is to stimulate discussion and further research of mental health and wellbeing within professional football. This paper considers the impact of Covid-19 and makes recommendations for professional football clubs to develop a holistic mental health strategy. We recommend that professional clubs increase the level of emotional support for professional footballers, and that this should not be a temporary measure due to the pandemic. Clubs should develop a long-term strategy to encourage players to seek emotional support.
Facilitating the deposit of experimental chemistry data in institutional repositories: Project SPECTRa
Institutional Open Access repositories are becoming established as an important part of the university library and information services infrastructure. While early efforts to populate them with content have concentrated on the deposit of peer-reviewed research papers, there is a growing awareness of their potential as repositories of data and other non-text materials, and consequently a need to develop strategies and procedures that can realise this potential. Chemistry as a discipline has been slower than the physical and biomedical sciences to adopt and exploit Open Access concepts in the handling of experimental data and research publications. Chemical information is essential to many sciences outside chemistry, and the reporting of the synthesis and properties of new chemical compounds is central to this. But most of the essential experimental data associated with peer-reviewed publications from chemistry departments are never communicated to the scientific community. These data are all available in high-quality electronic form in the laboratories but there is no effective method for archiving them or making them openly accessible. The SPECTRa (Submission, Preservation, and Exposure of Chemistry Teaching and Research Data) project addressed this problem. It was a JISC-funded 18-month collaboration, ending in March 2007, between the university libraries and chemistry departments of the University of Cambridge and Imperial College London, in co-operation with the eBank-UK project. Its main objective was to develop a set of customized software tools that would enable chemists routinely to deposit experimental data in Open Access repositories, employing the DSpace repository platform used by the two libraries. The work was informed by surveys of research chemists in the two universities, exploring their use of information technology and assessing their interest in using repositories and Open Access principles for data management. 
This paper presents the project's outcomes and discusses the implications for the development of library-managed institutional repositories.